Snowpack variability in western North America over the common era

Models and Observations

Nick Gauthier

Last knit on: 2019-11-06

Analysis of simulated and observed snow water equivalent (SWE) dynamics in western North America over the Common Era.

Setup

Import the packages required for this analysis.

library(raster) # processing raster data
library(tidyverse) # data manipulation and visualization
library(mgcv) # flexible nonlinear regression models
library(furrr) # parallel processing
library(gganimate) # animated gifs
library(ggridges) # ridge plots
library(broom) # tidying model objects

Define a study area to constrain all computaitons.

bbox <- extent(c(-125, -104, 33, 50))

Observations

Import the snow observation data from https://nsidc.org/data/nsidc-0719.1 What’s up with this warning message? long_name=CRS definition spatial_ref=GEOGCS[“NAD83”,DATUM[“North_American_Datum_1983”,SPHEROID[“GRS 1980”,6378137,298.257222101,AUTHORITY[“EPSG”,“7019”]],AUTHORITY[“EPSG”,“6269”]],PRIMEM[“Greenwich”,0,AUTHORITY[“EPSG”,“8901”]],UNIT[“degree”,0.01745329251994328,AUTHORITY[“EPSG”,“9122”]],AUTHORITY[“EPSG”,“4269”]] GeoTransform=-125.0208 0.04166662697178698 0 49.9375 0 -0.04166662697178698. First define a function to extract the map for April 1st from each annual file, then run it on the SWE and snow depth data.

preprocess <- function(x, var, flip = FALSE, regrid = FALSE) {
  maps <- brick(x, varname = var)

  indices <- getZ(maps) %>% # find the index for April 1st
    str_detect('-04-01') %>% 
    which()
  
  subset(maps, indices) %>%
    {if(flip) rotate(.) else .} %>%
    crop(bbox) %>%
    {if(regrid) aggregate(., fact = 2, na.rm = TRUE) else .} # resample to lower res to speed up analysis
}
plan(multisession) # setup parallel processing

# import SWE data
swe_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
  future_map(preprocess, var = 'SWE', regrid = TRUE) %>%
  brick()

# import depth data
depth_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
  future_map(preprocess, var = 'DEPTH', regrid = TRUE) %>%
  brick()

Import SRTM elevation and resample to the resolution of the observations. Also calculate the area of each grid cell.

elev <- raster('~/gdrive/Data/SRTM_1km.tif') %>%
  crop(bbox) %>%
  resample(swe_obs) %>%
  brick(., area(.))

Turn all the rasters into data frames and join.

swe_dat <- swe_obs %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = round(parse_number(layer))) %>%
  rename(SWE = value) %>%
  select(-layer)

depth_dat <- depth_obs %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = round(parse_number(layer))) %>%
  rename(DEPTH = value) %>%
  select(-layer)

elev_dat <- elev %>%
  as.data.frame(xy = TRUE, na.rm = TRUE) %>%
  rename(elev = SRTM_1km, area = layer)

snow_dat <- swe_dat %>%
  left_join(depth_dat, by = c('x', 'y', 'year')) %>%
  left_join(elev_dat, by = c('x', 'y'))

Look at how SWE has varied over time and space.

Some aggregate statistics. Is this the right way to aggregate?

snow_agg <- snow_dat %>%
  filter(SWE > 0) %>%
  group_by(year) %>%
  summarise(SWE = sum(SWE * area), # units in megaliters (?)
            area = sum(area))

How has total SWE varied over the observational period?

What about total area covered?

Simulations

cesm_h2osno <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.08500101-18491231.nc',
                          var = 'H2OSNO',
                          flip = TRUE)
## Loading required namespace: ncdf4
cesm_h2osno_ext <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.18500101-20051231.nc',
                          var = 'H2OSNO',
                          flip = TRUE)

cesm_areas <- area(cesm_h2osno) %>%
  as.data.frame(xy = TRUE, na.rm = TRUE) %>%
  rename(area = layer)

still need to account for snow fraction of grid cell? and land fraction too?

cesm_dat <- c(cesm_h2osno, cesm_h2osno_ext) %>%
  brick() %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = str_sub(layer, 2) %>% parse_number() %>% round) %>%
  rename(SWE = value) %>%
  select(-layer) %>%
  left_join(cesm_areas, by = c('x', 'y'))

cesm_agg <- cesm_dat %>%
  group_by(year) %>%
  summarise(SWE = sum(SWE * area, na.rm = TRUE)) %>%
  mutate(avg = zoo::rollmean(SWE, k = 100, na.pad = TRUE))
## Warning: Removed 99 rows containing missing values (geom_path).

So there’s a clear bias here. The mean of the observations is well higher than it should be. Looks like the simulation underpredicts snow – likely because of all the small-scale topographic influences that CESM can’t resolve.

## Picking joint bandwidth of 3090000

Let’s compare the observations and simulation directly.

What’s the climatic mean April 1st SWE for the observations and simulations?

To what extent is this mistmatch due to bias or scale effects? Test by resampling the observations to the CESM LME grid.

This comparison suggests that CESM is underpredicting SWE because of the simplified topography. Check this by looking at the underlying topography boundary files for that simulation.

surface_geop <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc', 
                       var = 'PHIS') %>%
  rotate() %>%
  crop(bbox, snap = 'out') %>%
  `/`(9.80665) %>%
  as.data.frame(na.rm = TRUE, xy = TRUE) 

landfrac <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc', 
                   var = 'LANDFRAC') %>%
  rotate() %>%
  crop(bbox, snap = 'out')

Might be interesting to look at snow centroid date too down the road. Spatial patterns notwithstanding, maybe the aggregate temporal patterns are still recoverable?

Empirical Orthogonal Functions

Now run a principal components analysis on the observed SPEI data. Much of the code that follows is adapted from the wql and sinkr packages.

obs_pca <- swe_dat %>%
  group_by(x, y) %>%
  mutate(test = all(SWE == 0)) %>%
  filter(test == FALSE) %>%
  ungroup() %>%
  select(-test)%>%
  spread(year, SWE) %>%
  select(-x, -y) %>%
  t() %>%
  prcomp(scale. = TRUE)

sim_pca <- cesm_dat %>%
  filter(year > 1900) %>%
  group_by(x, y) %>%
  mutate(test = all(SWE == 0)) %>%
  filter(test == FALSE) %>%
  ungroup() %>%
  select(-test) %>%
  spread(year, SWE) %>%
  select(-x, -y, -area) %>%
  t() %>%
  prcomp(scale. = TRUE)

eofs_obs <- obs_pca %>% # calculate unrotated EOFs
  tidy(matrix = 'variables') %>%
  filter(PC <= 2) %>%
  left_join(obs_eigs[1:2], by = 'PC') %>%
  mutate(eof = value * std.dev,
         PC = as.character(PC)) %>%
  select(-std.dev,-value) 

eofs_sim <- sim_pca %>% # calculate unrotated EOFs
  tidy(matrix = 'variables') %>%
  filter(PC <= 4) %>%
  left_join(sim_eigs[1:2], by = 'PC') %>%
  mutate(eof = value * std.dev,
         PC = as.character(PC)) %>%
  select(-std.dev,-value) 
## Joining, by = "column"

## Joining, by = "column"

Other Analyses

Trend Analysis

m1 <- bam(SWE ~ te(x, y, by = year, k = 20), data = snow_dat)
summary(m1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## SWE ~ te(x, y, by = year, k = 20)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1056.66      19.45   54.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df    F p-value    
## te(x,y):year 382  383.8 3717  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.475   Deviance explained = 47.5%
## fREML = 9.8714e+06  Scale est. = 16088     n = 1576188

Looks good down here, but these are in absolute numbers. Maybe rescale to look at relative change?

snow_dat %>%
  filter(year == 1999) %>%
  mutate(pred = predict(m1, ., type = 'response')) %>%
  ggplot(aes(x, y)) +
  geom_raster(aes(fill = pred)) +
  scale_fill_distiller(palette = 'RdBu', direction = 1, limits = c(-1000, 1000))

Snow Density

First let’s model the relationship between snow depth and snow water equivalent (i.e. snow density) in the observations.

snow_dat %>%
  ggplot(aes(DEPTH, SWE)) +
  geom_point()

m2 <- bam(SWE ~ s(DEPTH), data = snow_dat, discrete = TRUE)
plot(m2)

summary(m2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## SWE ~ s(DEPTH)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1165.0027     0.3417    3409   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df       F p-value    
## s(DEPTH) 8.824  8.981 8550156  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.98   Deviance explained =   98%
## fREML = 7.2993e+06  Scale est. = 616.49    n = 1576188

Temporal Autocorrelation

knitr::knit_exit()